    Identifiability and transportability in dynamic causal networks

    In this paper we propose a causal analog of the purely observational Dynamic Bayesian Networks, which we call Dynamic Causal Networks. We provide a sound and complete algorithm for identification in Dynamic Causal Networks, namely, for computing the effect of an intervention or experiment from passive observations only, whenever possible. We note the existence of two types of confounder variables that affect the identification procedures in substantially different ways, a distinction with no analog in either Dynamic Bayesian Networks or standard causal graphs. We further propose a procedure for the transportability of causal effects in Dynamic Causal Network settings, where the results of causal experiments in a source domain may be used to identify causal effects in a target domain.
    Preprint

    Automated construction and analysis of political networks via open government and media sources

    We present a tool that generates real-world political networks from user-provided lists of politicians and news sites. Additional outputs include visualizations, interactive tools, and maps that help a user better understand the politicians and their surrounding environments as portrayed by the media. As a case study, we construct a comprehensive list of current Texas politicians, select news sites that convey a spectrum of political viewpoints on Texas politics, and examine the results. We propose a "Combined" co-occurrence distance metric to better reflect the relationship between two entities. We also propose a topic modeling technique as a novel, automated way of labeling the communities that exist within a politician's "extended" network.
    Peer Reviewed. Postprint (author's final draft)
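    Networks of this kind are typically built from pairwise co-occurrence counts. The abstract does not define the "Combined" metric, so the sketch below only shows raw co-occurrence counting, assuming a politician is "mentioned" when their name appears as a case-insensitive substring of an article. The function name `cooccurrence_counts` and the matching rule are hypothetical, not the tool's actual implementation.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(articles, politicians):
    """Count, for each unordered pair of politicians, how many articles mention both.

    Hypothetical sketch: a politician counts as mentioned when their name is a
    case-insensitive substring of the article text (an assumption; real entity
    matching would be more robust).
    """
    counts = Counter()
    for text in articles:
        lowered = text.lower()
        # Sort names so each pair is always stored under the same key order.
        mentioned = sorted({p for p in politicians if p.lower() in lowered})
        for a, b in combinations(mentioned, 2):
            counts[(a, b)] += 1
    return counts
```

A distance between two politicians could then be derived from these counts; how the paper's "Combined" metric does so is not stated here.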

    Learning definite Horn formulas from closure queries

    A definite Horn theory is a set of n-dimensional Boolean vectors whose characteristic function is expressible as a definite Horn formula, that is, as a conjunction of definite Horn clauses. The class of definite Horn theories is known to be learnable in several query learning settings, such as learning from membership and equivalence queries or learning from entailment. We propose yet another type of query: the closure query. Closure queries are a natural extension of membership queries and also a variant of the so-called correction queries that is appropriate for definite Horn formulas. We present an algorithm that learns conjunctions of definite Horn clauses in polynomial time using closure and equivalence queries, and show how it relates to the canonical Guigues–Duquenne basis for implicational systems. We also show how the query models mentioned relate to each other, either through full-fledged reductions by means of query simulation (where possible) or through their connections in the context of particular algorithms that use them to learn definite Horn formulas.
    Peer Reviewed. Postprint (author's final draft)
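    A minimal sketch of the object a closure query returns, assuming the standard reading: given a vector x (encoded here as the set of variables set to true), the answer is the smallest superset of x that satisfies every clause of the target formula, computed by forward chaining. The encoding of a clause as a `(body, head)` pair and the helper names are illustrative assumptions, not the paper's algorithm.

```python
def horn_closure(x, clauses):
    """Smallest superset of x satisfying every definite Horn clause.

    Each clause is a pair (body, head), read as AND(body) -> head; an empty
    body encodes a fact that is always forced to true.
    """
    closed = set(x)
    changed = True
    while changed:
        changed = False
        for body, head in clauses:
            # Fire the clause if its body holds and its head is not yet true.
            if head not in closed and set(body) <= closed:
                closed.add(head)
                changed = True
    return frozenset(closed)


def make_closure_oracle(target_clauses):
    # Hypothetical oracle answering closure queries for a fixed target formula.
    return lambda x: horn_closure(x, target_clauses)


def simulate_membership(oracle, x):
    # x satisfies the target exactly when closing it adds nothing.
    return oracle(x) == frozenset(x)
```

The last helper illustrates why closure queries extend membership queries: a single closure answer also tells the learner whether x itself is a model.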

    Classifier selection with permutation tests

    This work presents a content-based recommender system for machine learning classifier algorithms. Given a new data set, it recommends the classifier likely to perform best, based on classifier performance on similar known data sets. Similarity is measured with a data set characterization that includes several state-of-the-art metrics covering physical structure, statistics, and information theory. A novelty with respect to prior work is the use of a robust approach based on permutation tests to directly assess whether a given learning algorithm is able to exploit the attributes of a data set to predict class labels, which we compare with the more commonly used F-score for evaluating classifier performance. To evaluate our approach, we conducted extensive experiments with 8 of the main machine learning classification methods in varying configurations and 65 binary data sets, leading to over 2331 experiments. Our results show that using the information from the permutation test clearly improves the quality of the recommendations.
    Peer Reviewed. Postprint (author's final draft)
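    The permutation-test idea can be sketched as follows: score a classifier on the real labels, then repeatedly shuffle the labels and re-score; if shuffled labels rarely do as well, the classifier is exploiting real structure in the attributes. The nearest-centroid classifier, the single train/test split, accuracy as the score, and the p-value formula (hits + 1) / (permutations + 1) are all assumptions for illustration; the abstract does not specify the paper's exact protocol.

```python
import random


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    # Fit: mean feature vector for each class present in the training labels.
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        # Assign x to the class whose centroid is closest (squared Euclidean).
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    correct = sum(predict(x) == y for x, y in zip(X_test, y_test))
    return correct / len(y_test)


def permutation_test(X, y, n_train, n_permutations=200, seed=0):
    """Return (observed accuracy, permutation p-value) for a fixed split."""
    rng = random.Random(seed)

    def score(labels):
        return nearest_centroid_accuracy(X[:n_train], labels[:n_train], X[n_train:], labels[n_train:])

    observed = score(list(y))
    hits = 0
    for _ in range(n_permutations):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break any link between attributes and labels
        if score(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_permutations + 1)
```

A small p-value suggests the classifier is using the attributes; a large one suggests its score could be reached by chance on that data set.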

    Characterizing transactional databases for frequent itemset mining

    This paper presents a study of the characteristics of transactional databases used in frequent itemset mining. Such characterizations have typically been used to benchmark and understand the data mining algorithms that work on these databases. The aim of our study is to give a picture of how diverse and representative these benchmarking databases are, both in general and in the context of particular empirical studies found in the literature. Our proposed list of metrics includes many of the existing metrics found in the literature, as well as new ones. Our study shows that this list of metrics captures much of the datasets' inner complexity and thus provides a good basis for characterizing transactional datasets. Finally, we provide a set of representative datasets, selected on the basis of our characterization, that can safely be used as a benchmark.
    Both authors have been partially supported by TIN2017-89244-R from MINECO (Spain's Ministerio de Economia, Industria y Competitividad) and the recognition 2017SGR-856 (MACDA) from AGAUR (Generalitat de Catalunya). Christian Lezcano is supported by Paraguay's Foreign Postgraduate Scholarship Programme Don Carlos Antonio López (BECAL).
    Peer Reviewed. Postprint (published version)
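    As a rough illustration of what such metrics look like, the sketch below computes a few basic descriptors that are common in the frequent itemset mining literature: number of transactions, number of distinct items, average transaction length, and density (average length divided by the number of distinct items). This is an assumed subset chosen for illustration; the paper's full metric list is not reproduced in the abstract. Transactions are assumed to be sets, so duplicate items are not double-counted.

```python
def basic_transaction_stats(transactions):
    """Basic descriptors of a transactional dataset given as a list of item sets."""
    n = len(transactions)
    items = set().union(*transactions) if transactions else set()
    total_items = sum(len(t) for t in transactions)
    avg_length = total_items / n if n else 0.0
    # Density: fraction of the item universe present in an average transaction.
    density = avg_length / len(items) if items else 0.0
    return {
        "transactions": n,
        "distinct_items": len(items),
        "avg_length": avg_length,
        "density": density,
    }
```

For example, the three transactions {a, b}, {b, c}, and {a, b, c} give an average length of 7/3 and a density of 7/9.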

    Synthetic dataset generation with itemset-based generative models

    This paper proposes three different data generators, tailored to transactional datasets and based on existing itemset-based generative models. All three generators are intuitive, easy to implement, and perform satisfactorily. The quality of each generator is assessed with three different methods that capture how well the structure of the original dataset is preserved.
    Both authors have been partially supported by TIN2017-89244-R from MINECO (Spain's Ministerio de Economia, Industria y Competitividad) and the recognition 2017SGR-856 (MACDA) from AGAUR (Generalitat de Catalunya). Christian Lezcano is supported by Paraguay's Foreign Postgraduate Scholarship Programme Don Carlos Antonio López (BECAL).
    Peer Reviewed. Postprint (author's final draft)
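    To give a feel for what "itemset-based" generation means, here is a toy generator in that spirit: each synthetic transaction is the union of itemsets, each included independently with its own probability. This independence model, the `(itemset, probability)` input format, and the function name are assumptions made for illustration; it is not one of the paper's three generators.

```python
import random


def generate_transactions(weighted_itemsets, n_transactions, seed=0):
    """Toy itemset-based generator (hypothetical, not the paper's method).

    weighted_itemsets: list of (itemset, probability) pairs. Each transaction is
    the union of the itemsets whose independent Bernoulli draw succeeds.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_transactions):
        transaction = set()
        for itemset, prob in weighted_itemsets:
            if rng.random() < prob:  # include this itemset with probability prob
                transaction |= set(itemset)
        data.append(frozenset(transaction))
    return data
```

Because whole itemsets are inserted together, co-occurrence patterns present in the input model reappear in the synthetic data, which is the kind of structure the paper's quality measures are meant to check.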

    Does training affect match performance? A study using data mining and tracking devices

    FIFA has recently allowed the use of electronic performance and tracking systems (EPTS) in professional football competition, providing teams with novel and more accurate data. Physical performance has not yet received much attention from the research community, because it is difficult to access this information with the same devices during both training and competition. This study provides a methodology based on machine learning and statistical methods to relate the variation in players' physical performance during time-framed training sessions to their performance in the following matches. The analysis is carried out on F.C. Barcelona B data from the 2015-2016 season and emphasizes exploiting the design characteristics of the structured training methodology implemented within the club. The use of summarized physical variation data revealed a notable relationship between higher magnitudes of variation in 3-week training time frames and higher physical values in the following matches. With increased data availability, this and other new approaches could open a new frontier in physical performance analysis. To the best of our knowledge, this is the first study to relate training and match performance through the same EPTS devices in professional football.
    Peer Reviewed. Postprint (published version)

    Strategies for improving performance in a difficult theory course of the selective phase

    Introducción a la Lógica (Introduction to Logic) is a first-year course in the Computer Engineering degree taught at the Facultad de Informática de Barcelona (UPC). The course was redesigned in the 2006-07 academic year and went from being one of the easiest to one of the hardest. As might be expected, it has a very bad reputation among students beginning their degree, because of its difficulty and the low marks they obtain. In this article we analyze the grades obtained over a total of seven semesters, grouped by the topic of their content. The course consists of three thematic blocks, which we call T1, T2, and T3. Each block covers content of a different difficulty, and the level of demand within each block also varies. The main observation from our analysis is that the average grade in block T3 is much lower than in blocks T1 and T2. We also detect an upward trend in the grades of each thematic block. This article aims to identify the possible causes of students' low performance within the context of each of these three blocks. Our main conclusion is that the content of block T3 is too advanced for a first-year student, whereas blocks T1 and T2 could benefit from a teaching methodology with closer monitoring and more tutored activities.
    Postprint (author's final draft)

    Does like seek like?: the formation of work groups in the programming project

    In a course of the computer engineering degree, the programming project has changed from individual work to teamwork, in principle in pairs (pair programming). Students have complete freedom to form teams, with minimal intervention from the teaching staff. The analysis of the pairs formed indicates that students do not tend to team up with students of similar academic performance, perhaps because general cognitive parameters do not govern the choice of an academic partner.
    Peer Reviewed. Postprint (published version)

    Does like seek like?: the formation of working groups in a programming project

    In a course of the computer science degree, the programming project has changed from individual to team work, tentatively in pairs (pair programming). Students have full freedom to team up, with minimal intervention from teachers. The analysis of the pairs formed indicates that students do not tend to associate with students of similar academic performance, perhaps because general cognitive parameters do not govern the choice of academic partners. Pair programming seems to give great results, so future research in this field should focus precisely on how these pairs are formed, grounding this work in the mechanisms of human social interaction.
    Peer Reviewed